INFORMATION  ANALYSIS  OF  LINEAR  INTERACTIONS  IN 
CONTINGENCY  TABLES 

BY 

S.  KULLBACK  and  D.  V.  GOKHALE 


TECHNICAL  REPORT  NO.  9 
AUGUST  15,  1977 


PREPARED  UNDER  GRANT 
DAAG29-77-G-0031 

FOR  THE  U.S.  ARMY  RESEARCH  OFFICE 


Reproduction  in  Whole  or  in  Part  is  Permitted 
for  any  purpose  of  the  United  States  Government 

Approved  for  public  release;  distribution  unlimited. 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 


Information  Analysis  of  Linear  Interactions  In 
Contingency  Tables 



S.  Kullback  and  D.  V.  Gokhale 


TECHNICAL  REPORT  NO.  9
August  15,  1977 


Prepared  under  Grant  DAAG29-77-G-0031
For  the  U.S.  Army  Research  Office

Herbert  Solomon,  Project  Director


Approved  for  public  release;  distribution  unlimited.


DEPARTMENT  OF  STATISTICS
STANFORD  UNIVERSITY
STANFORD,  CALIFORNIA 


Partially  supported  under  Office  of  Naval  Research  Contract  N00014-76-C-0475
(NR-042-267)  and  issued  as  Technical  Report  No.  249.


THE  FINDINGS  IN  THIS  REPORT  ARE  NOT  TO  BE 
CONSTRUED  AS  AN  OFFICIAL  DEPARTMENT  OF 
THE  ARMY  POSITION,  UNLESS  SO  DESIGNATED
BY  OTHER  AUTHORIZED  DOCUMENTS. 


INFORMATION  ANALYSIS  OF  LINEAR  INTERACTIONS  IN  CONTINGENCY  TABLES 


S.  KULLBACK

The  George  Washington  University 
D.V.  GOKHALE 

University  of  California,  Riverside 


1.  INTRODUCTION 

The purpose of this article is to illustrate the use of the
minimum discrimination information (MDI) approach in studying
null hypotheses of no linear interactions in contingency tables
of the "one response many factors" type. In such contingency tables,
the data can be looked upon as a collection of as many multinomial
experiments as there are factor-level combinations, and
each experiment has a number of cells equal to the number of levels
of the response variable. One formulation of a "no linear
interaction" hypothesis is that the cell probabilities of the
response variable can be expressed as linear functions of parameters
which are structurally less complex. For accounts of different
formulations of no-interaction hypotheses and related references,
the reader is referred to Bhapkar and Koch [1968] and Darroch [1974].

The "no linear interaction" hypotheses can be formulated as
linear constraints on the underlying probabilities, written in
matrix notation as $Bp = \theta$. It is possible to apply MDI
analysis to obtain estimates of cell frequencies and to test
various hypotheses and sub-hypotheses. If the hypotheses
are "nested", the MDI statistic for the stronger hypothesis
(which imposes more constraints) can be analyzed into two
components, one measuring disparity between the observed
distribution and the weaker hypothesis and the other
measuring disparity between the estimated distributions
under the two hypotheses. This feature of the MDI
statistics is not enjoyed by the chi-square-type or Wald-type
statistics used by many authors.

For the sake of clarity of presentation, we will
restrict ourselves to the hypotheses of no linear second
order interaction in a 2x2x2 table and in a 4x2x2 table.
This enables us to compare results with Bhapkar and Koch
[1968], who have viewed two sets of data as of the
"one response many factors" type. The analysis of a 2x2x2
table shows how the use of an approximation in the MDI
statistic leads to a statistic used by Bhapkar and Koch
[1968]. The 4x2x2 table is analyzed under two hypotheses
of no linear interaction of second order, illustrating the
analysis of information mentioned in the preceding paragraph.

2.  GENERALITIES 

For a three-way r x s x t table in which the first variable
is a response and the other two variables are factors, one
formulation of no linear second order interaction is given
by

$$p(ijk) = \gamma(i..) + \gamma(ij.) + \gamma(i.k), \quad i=1,\ldots,r,\; j=1,\ldots,s,\; k=1,\ldots,t, \tag{2.1}$$

where the p(ijk) are subject to the constraints

$$\sum_{i=1}^{r} p(ijk) = 1 \quad \text{for each fixed pair } (jk), \tag{2.2}$$

and the parameters $\gamma$ depend only on the indicated indices. The
hypothesis is equivalent to the following (r-1)(s-1)(t-1)
constraints in addition to those in (2.2):

$$p(ijk) - p(ijt) - p(isk) + p(ist) = 0, \quad i=1,\ldots,r-1,\; j=1,\ldots,s-1,\; k=1,\ldots,t-1. \tag{2.3}$$

Writing

$$p = (p(111), p(211), \ldots, p(r11), p(112), \ldots, p(rst))', \tag{2.4}$$

where the (jk) indices are in lexicographic order, the
constraints (2.2) and (2.3) can be expressed as

$$Bp = \theta, \tag{2.5}$$

where the vector $\theta$ consists of the first st elements equal to
unity and the remaining elements equal to zero. This is
illustrated in the examples below.
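The construction of B and $\theta$ from (2.2) and (2.3) can be sketched in a few lines of Python. This is only an illustrative sketch under the cell ordering of (2.4); the function names `cell_index` and `build_constraints` are introduced here for illustration and are not taken from the report.

```python
def cell_index(i, j, k, r, t):
    """0-based position of cell (i, j, k) in the ordering of (2.4):
    i varies fastest, then k, then j (all indices are 1-based)."""
    return ((j - 1) * t + (k - 1)) * r + (i - 1)


def build_constraints(r, s, t):
    """Return (B, theta) expressing (2.2) and (2.3) for an r x s x t table."""
    n_cells = r * s * t
    B = []
    # (2.2): the r response probabilities sum to one in each factor combination.
    for j in range(1, s + 1):
        for k in range(1, t + 1):
            row = [0] * n_cells
            for i in range(1, r + 1):
                row[cell_index(i, j, k, r, t)] = 1
            B.append(row)
    # (2.3): one contrast for each i < r, j < s, k < t.
    for j in range(1, s):
        for k in range(1, t):
            for i in range(1, r):
                row = [0] * n_cells
                row[cell_index(i, j, k, r, t)] += 1
                row[cell_index(i, j, t, r, t)] -= 1
                row[cell_index(i, s, k, r, t)] -= 1
                row[cell_index(i, s, t, r, t)] += 1
                B.append(row)
    theta = [1] * (s * t) + [0] * ((r - 1) * (s - 1) * (t - 1))
    return B, theta


B, theta = build_constraints(2, 2, 2)
for row in B:
    print(row)
print(theta)
```

For a 2x2x2 table this yields five rows (four normalizations and one contrast); for a 4x2x2 table it yields seven rows.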

Let x(ijk) denote the observed frequency in the (ijk)-th
cell and x denote a vector similar to p of (2.4). Also let
x(.jk) denote the total number of observations for the (jk)-th
factor combination and let $N = \sum_{j}\sum_{k} x(.jk)$. Basic to the
information analysis is the discrimination information
function

$$I(p:\pi) = \sum_{j=1}^{s}\sum_{k=1}^{t} w(jk) \sum_{i=1}^{r} p(ijk)\,\ln[p(ijk)/\pi(ijk)], \tag{2.6}$$

where w(jk) = x(.jk)/N. The vector $\pi$ is similar to p; it is
an arbitrary collection of st probability distributions, each
on r cells. It is assumed that x(ijk), p(ijk) and $\pi$(ijk) are
positive for all (ijk). The choice of $\pi$ depends on the
analysis at hand. When it is desired to assess the departure
of the data from an external hypothesis (as is the present
case), $\pi$ is taken to be the vector of observed proportions in
each of the st factor combinations.
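As a small numerical illustration of (2.6), the following Python sketch evaluates I(p:$\pi$) with $\pi$ taken as the observed proportions, and checks that 2N I(p:$\pi$) coincides with the count form 2 Σ x' ln(x'/x), where x'(ijk) = x(.jk) p(ijk). The counts and the candidate distribution p below are hypothetical values chosen only for illustration, and the function name is an assumption of this sketch.

```python
import math


def discrimination_information(p, pi, w):
    """I(p:pi) of (2.6); p and pi hold one response distribution per
    factor combination (jk), and w holds the weights w(jk)."""
    total = 0.0
    for w_jk, p_jk, pi_jk in zip(w, p, pi):
        total += w_jk * sum(a * math.log(a / b) for a, b in zip(p_jk, pi_jk))
    return total


# Hypothetical counts: two factor combinations, three response levels.
x = [[30, 50, 20], [10, 60, 30]]
N = sum(sum(col) for col in x)
w = [sum(col) / N for col in x]
pi = [[v / sum(col) for v in col] for col in x]   # observed proportions
p = [[0.25, 0.50, 0.25], [0.15, 0.55, 0.30]]      # a candidate distribution

info = discrimination_information(p, pi, w)

# Same quantity written in terms of counts x'(ijk) = x(.jk) p(ijk).
count_form = 2.0 * sum(
    sum(col) * a * math.log(sum(col) * a / v)
    for col, p_jk in zip(x, p)
    for a, v in zip(p_jk, col)
)
print(2.0 * N * info, count_form)
```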


The MDI estimates x*(ijk) = Np*(ijk) are such that the
discrimination information (2.6) is minimized subject to the
constraints $Bp = \theta$. In other words, p*(ijk) is the distribution
which satisfies the hypothesized constraints (2.5) and is
"closest" (in the MDI sense) to the observed distribution.

There are several convergent iterative computer algorithms
for obtaining x*(ijk). One is described in the Appendix.

The MDI statistic 2I(x*:x) = 2NI(p*:$\pi$) has a chi-square
distribution in large samples with degrees of freedom equal to
rank(B) - st.

3.  THE  2x2x2  TABLE

Consider the probabilities of a 2 x 2 x 2 contingency
table (Table 1).

Table 1

                    B (j=1)                 β (j=2)
               C (k=1)   γ (k=2)       C (k=1)   γ (k=2)
A (i=1)        p(111)    p(112)        p(121)    p(122)
α (i=2)        p(211)    p(212)        p(221)    p(222)

The experimental procedure selects a fixed number of
observations under the four possible combinations of the
factors (B, β), (C, γ) and determines the number of occurrences
of (A, α) for each case. In effect then the procedure is
examining four binomials with

$$p(1jk) + p(2jk) = 1, \quad j = 1,2,\; k = 1,2. \tag{3.1}$$

The corresponding observed values are shown in Table 2.
It is desired to test whether the observed values are consistent
with a null hypothesis of no interaction on a linear scale,
that is

$$H_0:\; p(111) - p(112) = p(121) - p(122), \quad \text{or} \quad p(111) - p(112) - p(121) + p(122) = 0. \tag{3.2}$$

Table 2

                 j=1                     j=2
            k=1       k=2           k=1       k=2
i=1        x(111)    x(112)        x(121)    x(122)
i=2        x(211)    x(212)        x(221)    x(222)
Total      x(.11)    x(.12)        x(.21)    x(.22)

We shall determine estimates for the cell entries subject
to the null hypothesis and compare the estimated and observed
values. The estimated table is given in Table 3, where the
$\lambda$'s are to be determined.

Table 3

                       j=1                                     j=2
            k=1                 k=2                 k=1                 k=2
i=1     x(111)+$\lambda_1$    x(112)+$\lambda_2$    x(121)+$\lambda_3$    x(122)+$\lambda_4$
i=2     x(211)-$\lambda_1$    x(212)-$\lambda_2$    x(221)-$\lambda_3$    x(222)-$\lambda_4$
Total   x(.11)              x(.12)              x(.21)              x(.22)

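We shall use the principle of minimum discrimination
information estimation and thus determine the $\lambda$'s which
minimize

$$
\begin{aligned}
&(x(111)+\lambda_1)\ln\frac{x(111)+\lambda_1}{x(111)} + (x(211)-\lambda_1)\ln\frac{x(211)-\lambda_1}{x(211)} \\
&\quad + (x(112)+\lambda_2)\ln\frac{x(112)+\lambda_2}{x(112)} + (x(212)-\lambda_2)\ln\frac{x(212)-\lambda_2}{x(212)} \\
&\quad + (x(121)+\lambda_3)\ln\frac{x(121)+\lambda_3}{x(121)} + (x(221)-\lambda_3)\ln\frac{x(221)-\lambda_3}{x(221)} \\
&\quad + (x(122)+\lambda_4)\ln\frac{x(122)+\lambda_4}{x(122)} + (x(222)-\lambda_4)\ln\frac{x(222)-\lambda_4}{x(222)} \\
&\quad + \tau\left[\frac{x(111)+\lambda_1}{x(.11)} - \frac{x(112)+\lambda_2}{x(.12)} - \frac{x(121)+\lambda_3}{x(.21)} + \frac{x(122)+\lambda_4}{x(.22)}\right],
\end{aligned}
\tag{3.3}
$$

where $\tau$ is a Lagrange undetermined multiplier and (3.2) is
reflected by the condition

$$\frac{x(111)+\lambda_1}{x(.11)} - \frac{x(112)+\lambda_2}{x(.12)} - \frac{x(121)+\lambda_3}{x(.21)} + \frac{x(122)+\lambda_4}{x(.22)} = 0. \tag{3.4}$$

Differentiating (3.3) with respect to $\lambda_1, \ldots, \lambda_4$ leads
to the "normal" equations

$$
\begin{aligned}
\ln\frac{x(111)+\lambda_1}{x(111)} - \ln\frac{x(211)-\lambda_1}{x(211)} + \frac{\tau}{x(.11)} &= 0, \\
\ln\frac{x(112)+\lambda_2}{x(112)} - \ln\frac{x(212)-\lambda_2}{x(212)} - \frac{\tau}{x(.12)} &= 0, \\
\ln\frac{x(121)+\lambda_3}{x(121)} - \ln\frac{x(221)-\lambda_3}{x(221)} - \frac{\tau}{x(.21)} &= 0, \\
\ln\frac{x(122)+\lambda_4}{x(122)} - \ln\frac{x(222)-\lambda_4}{x(222)} + \frac{\tau}{x(.22)} &= 0.
\end{aligned}
\tag{3.5}
$$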

There are a number of different iterative approaches to determine
the solution to (3.5), but our interest here is to examine the
relation of an approximate solution to other proposed methods.
Assuming that the ratios of the $\lambda$'s to the observed values
are small, we use the approximations

$$\ln\frac{x(111)+\lambda_1}{x(111)} \approx \frac{\lambda_1}{x(111)}, \qquad \ln\frac{x(211)-\lambda_1}{x(211)} \approx -\frac{\lambda_1}{x(211)}, \qquad \text{etc.},$$

in (3.5) and get

$$
\begin{aligned}
\frac{\lambda_1}{x(111)} + \frac{\lambda_1}{x(211)} + \frac{\tau}{x(.11)} &= 0 = \frac{\lambda_1\,x(.11)}{x(111)\,x(211)} + \frac{\tau}{x(.11)}, \\
\frac{\lambda_2}{x(112)} + \frac{\lambda_2}{x(212)} - \frac{\tau}{x(.12)} &= 0 = \frac{\lambda_2\,x(.12)}{x(112)\,x(212)} - \frac{\tau}{x(.12)}, \\
\frac{\lambda_3}{x(121)} + \frac{\lambda_3}{x(221)} - \frac{\tau}{x(.21)} &= 0 = \frac{\lambda_3\,x(.21)}{x(121)\,x(221)} - \frac{\tau}{x(.21)}, \\
\frac{\lambda_4}{x(122)} + \frac{\lambda_4}{x(222)} + \frac{\tau}{x(.22)} &= 0 = \frac{\lambda_4\,x(.22)}{x(122)\,x(222)} + \frac{\tau}{x(.22)}.
\end{aligned}
\tag{3.6}
$$

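From (3.6) and (3.4) we have, introducing the notation

$$x(1ij) = x(.ij)\,\hat p(ij), \qquad x(2ij) = x(.ij)\,\hat q(ij), \qquad \hat p(ij) + \hat q(ij) = 1,$$

$$
\begin{aligned}
\lambda_1 &= -\frac{x(111)\,x(211)}{(x(.11))^2}\,\tau = -\hat p(11)\hat q(11)\,\tau, \\
\lambda_2 &= \frac{x(112)\,x(212)}{(x(.12))^2}\,\tau = \hat p(12)\hat q(12)\,\tau, \\
\lambda_3 &= \frac{x(121)\,x(221)}{(x(.21))^2}\,\tau = \hat p(21)\hat q(21)\,\tau, \\
\lambda_4 &= -\frac{x(122)\,x(222)}{(x(.22))^2}\,\tau = -\hat p(22)\hat q(22)\,\tau, \\
\tau &= \frac{\hat p(11) - \hat p(12) - \hat p(21) + \hat p(22)}{\dfrac{\hat p(11)\hat q(11)}{x(.11)} + \dfrac{\hat p(12)\hat q(12)}{x(.12)} + \dfrac{\hat p(21)\hat q(21)}{x(.21)} + \dfrac{\hat p(22)\hat q(22)}{x(.22)}}.
\end{aligned}
\tag{3.7}
$$

Let us write

$$
\begin{aligned}
x^*(111) &= x(111) + \lambda_1, & x^*(211) &= x(211) - \lambda_1, \\
x^*(112) &= x(112) + \lambda_2, & x^*(212) &= x(212) - \lambda_2, \quad \text{etc.},
\end{aligned}
\tag{3.8}
$$

where the $\lambda$'s satisfy (3.5).

If we also use the quadratic approximations

$$
2\left\{(x(111)+\lambda_1)\ln\frac{x(111)+\lambda_1}{x(111)} + (x(211)-\lambda_1)\ln\frac{x(211)-\lambda_1}{x(211)}\right\}
\approx \lambda_1^2\left(\frac{1}{x(111)} + \frac{1}{x(211)}\right)
= \frac{\lambda_1^2\,x(.11)}{x(111)\,x(211)}
= \frac{\lambda_1^2}{x(.11)\,\hat p(11)\hat q(11)}, \quad \text{etc.},
\tag{3.9}
$$

then we get for the minimum discrimination information statistic

$$
\begin{aligned}
2I(x^*:x) &= 2\sum_i\sum_j\sum_k x^*(ijk)\ln\frac{x^*(ijk)}{x(ijk)} \\
&\approx \tau^2\left(\frac{\hat p(11)\hat q(11)}{x(.11)} + \frac{\hat p(12)\hat q(12)}{x(.12)} + \frac{\hat p(21)\hat q(21)}{x(.21)} + \frac{\hat p(22)\hat q(22)}{x(.22)}\right) \\
&= \frac{(\hat p(11) - \hat p(12) - \hat p(21) + \hat p(22))^2}{\dfrac{\hat p(11)\hat q(11)}{x(.11)} + \dfrac{\hat p(12)\hat q(12)}{x(.12)} + \dfrac{\hat p(21)\hat q(21)}{x(.21)} + \dfrac{\hat p(22)\hat q(22)}{x(.22)}} \\
&= \lambda_1^2\left(\frac{1}{x(111)} + \frac{1}{x(211)}\right) + \lambda_2^2\left(\frac{1}{x(112)} + \frac{1}{x(212)}\right) + \lambda_3^2\left(\frac{1}{x(121)} + \frac{1}{x(221)}\right) + \lambda_4^2\left(\frac{1}{x(122)} + \frac{1}{x(222)}\right).
\end{aligned}
\tag{3.10}
$$

Note that the last value in (3.10) is the modified Neyman $\chi^2$,

$$\chi_1^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{obs}}, \tag{3.11}$$

and indeed the equations in (3.6) are those to determine the
minimum modified $\chi^2$ estimates. The next to last value in (3.10) is
the statistic given by Bhapkar and Koch [1968, p. 116] based on a
criterion due to Wald. The square root of this value is the
statistic used by Snedecor and Cochran [1967, p. 496].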
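The approximate solution (3.7) and the two equivalent forms at the end of (3.10) can be checked numerically. The sketch below is an illustration only; it uses Bartlett's root-cutting counts, which appear in Table 4 later in this section, and the function name `approximate_mdi_2x2x2` is an assumption of this sketch rather than anything defined in the report.

```python
def approximate_mdi_2x2x2(x1, x2):
    """x1, x2: counts for response levels i=1 and i=2, each listed in the
    factor order (11), (12), (21), (22)."""
    sign = (1.0, -1.0, -1.0, 1.0)                 # coefficients of (3.2)
    n = [a + b for a, b in zip(x1, x2)]           # x(.jk)
    p_hat = [a / m for a, m in zip(x1, n)]
    pq_over_n = [p * (1.0 - p) / m for p, m in zip(p_hat, n)]
    theta_hat = sum(c * p for c, p in zip(sign, p_hat))
    tau = theta_hat / sum(pq_over_n)              # last line of (3.7)
    # lambda_1..lambda_4 of (3.7): -pq*tau, +pq*tau, +pq*tau, -pq*tau
    lam = [-c * p * (1.0 - p) * tau for c, p in zip(sign, p_hat)]
    wald_form = theta_hat ** 2 / sum(pq_over_n)   # next-to-last value in (3.10)
    neyman_form = sum(l * l * (1.0 / a + 1.0 / b)
                      for l, a, b in zip(lam, x1, x2))   # last value in (3.10)
    return tau, lam, wald_form, neyman_form


dead = [84, 133, 156, 209]     # i = 1
alive = [156, 107, 84, 31]     # i = 2
tau, lam, wald_form, neyman_form = approximate_mdi_2x2x2(dead, alive)
print(tau, lam, wald_form, neyman_form)
```

Both forms agree up to rounding, as (3.10) asserts, and the approximate estimate x(111) + $\lambda_1$ can be compared with the exact MDI estimate reported later in this section.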


In accordance with the minimum discrimination information
theorem (Kullback [1959]) the log-linear representation for
x*(ijk) is given graphically as in Figure 1, where the
interpretation is

$$
\begin{aligned}
\ln\frac{x^*(111)}{x(111)} &= L_1 - \frac{\tau}{x(.11)}, & \ln\frac{x^*(211)}{x(211)} &= L_1, \\
\ln\frac{x^*(112)}{x(112)} &= L_2 + \frac{\tau}{x(.12)}, & \ln\frac{x^*(212)}{x(212)} &= L_2, \\
&\;\;\vdots \\
\ln\frac{x^*(222)}{x(222)} &= L_4.
\end{aligned}
\tag{3.12}
$$

Recalling (3.8) we see that (3.12) in fact leads to (3.5).
If we write

$$
\begin{aligned}
\theta^* &= \frac{x^*(111)}{x(.11)} - \frac{x^*(112)}{x(.12)} - \frac{x^*(121)}{x(.21)} + \frac{x^*(122)}{x(.22)} = p^*(11) - p^*(12) - p^*(21) + p^*(22), \\
\hat\theta &= \frac{x(111)}{x(.11)} - \frac{x(112)}{x(.12)} - \frac{x(121)}{x(.21)} + \frac{x(122)}{x(.22)} = \hat p(11) - \hat p(12) - \hat p(21) + \hat p(22),
\end{aligned}
\tag{3.13}
$$

then as shown in Kullback [1959, p. 101-106]

$$2I(x^*:x) \approx (\theta^* - \hat\theta)^2/\sigma^2, \tag{3.14}$$

where $\sigma^2$ is determined as follows. Let T denote the 8x5 matrix
in Figure 1, that is,

$$
T = \begin{pmatrix}
1 & 0 & 0 & 0 & -1/x(.11) \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & +1/x(.12) \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & +1/x(.21) \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1/x(.22) \\
0 & 0 & 0 & 1 & 0
\end{pmatrix},
\tag{3.15}
$$

and D the 8x8 diagonal matrix with entries x(ijk), that is,

$$D = \mathrm{diag}\bigl(x(111), x(211), x(112), x(212), x(121), x(221), x(122), x(222)\bigr). \tag{3.16}$$

Compute the 5x5 matrix S = T'DT and partition it as follows:

$$
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad S_{11} \text{ is } 4\times 4, \quad S_{22} \text{ is } 1\times 1, \quad S_{21} = S_{12}';
\tag{3.17}
$$

then $\sigma^2$ in (3.14) is given by

$$\sigma^2 = S_{22} - S_{21}S_{11}^{-1}S_{12}. \tag{3.18}$$


It may be verified that this results in

$$
\begin{aligned}
\sigma^2 &= \frac{x(111)\,x(211)}{(x(.11))^3} + \frac{x(112)\,x(212)}{(x(.12))^3} + \frac{x(121)\,x(221)}{(x(.21))^3} + \frac{x(122)\,x(222)}{(x(.22))^3} \\
&= \frac{\hat p(11)\hat q(11)}{x(.11)} + \frac{\hat p(12)\hat q(12)}{x(.12)} + \frac{\hat p(21)\hat q(21)}{x(.21)} + \frac{\hat p(22)\hat q(22)}{x(.22)}.
\end{aligned}
\tag{3.19}
$$

But $\theta^*$ in (3.13) is zero and we see that (3.14) is indeed the
next-to-last value in (3.10). It is interesting to note that
2I(x*:x) can be approximated without necessarily computing
the values of x*(ijk).
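The verification of (3.19) from (3.15) to (3.18) can also be carried out numerically. The following Python sketch is an illustration under stated assumptions: it forms S = T'DT for given counts and evaluates $\sigma^2$, exploiting the fact that $S_{11}$ is diagonal for this particular T. The counts are those of Bartlett's data given in Table 4 of this section, and the function name is introduced here only for illustration.

```python
def sigma_squared(x1, x2):
    """sigma^2 of (3.18) for a 2x2x2 table; x1, x2 are the counts for
    i=1 and i=2 in the factor order (11), (12), (21), (22)."""
    tau_sign = (-1.0, 1.0, 1.0, -1.0)      # tau column of T for the i=1 cells
    n = [a + b for a, b in zip(x1, x2)]
    T, d = [], []
    # Rows in the order 111, 211, 112, 212, 121, 221, 122, 222.
    for j in range(4):
        indicator = [1.0 if c == j else 0.0 for c in range(4)]
        T.append(indicator + [tau_sign[j] / n[j]])
        d.append(float(x1[j]))
        T.append(indicator + [0.0])
        d.append(float(x2[j]))
    # S = T' D T, a 5x5 matrix.
    S = [[sum(T[row][a] * d[row] * T[row][b] for row in range(8))
          for b in range(5)] for a in range(5)]
    # S11 (upper-left 4x4 block) is diagonal for this T, so its inverse
    # is taken elementwise; this shortcut would not hold for a general T.
    correction = sum(S[4][a] * S[a][4] / S[a][a] for a in range(4))
    return S[4][4] - correction


dead = [84, 133, 156, 209]
alive = [156, 107, 84, 31]
s2 = sigma_squared(dead, alive)
direct = sum(a * b / float(a + b) ** 3 for a, b in zip(dead, alive))  # (3.19)
print(s2, direct)
```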


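Note now that in order to express the hypothesis of
(3.2) in the form $Bp = \theta$ we can let

$$p = (p(111), p(211), p(112), p(212), p(121), p(221), p(122), p(222))',$$

$$
B = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & -1 & 0 & -1 & 0 & 1 & 0
\end{pmatrix}
\tag{3.20}
$$

and

$$\theta = (1, 1, 1, 1, 0)'. \tag{3.21}$$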


Figure 1

 i j k      L1    L2    L3    L4    $\tau$
 1 1 1      1                       -1/x(.11)
 2 1 1      1
 1 1 2            1                 +1/x(.12)
 2 1 2            1
 1 2 1                  1           +1/x(.21)
 2 2 1                  1
 1 2 2                        1     -1/x(.22)
 2 2 2                        1

We shall illustrate the preceding discussion by Bartlett's
data on root cuttings, used also as an example by Snedecor
and Cochran [1967], Bhapkar and Koch [1968], and Berkson [1972].

The following Table 4 from Bartlett [1935], who refers
to data from Hoblyn and Palmer, is the result of an experiment
designed to investigate the propagation of plum root stocks
from root cuttings. There were 240 cuttings for each of the
four treatments.


Table 4

                  At Once (j=1)            In Spring (j=2)
                Long      Short           Long      Short
                k=1       k=2             k=1       k=2
Dead   i=1       84       133             156       209
Alive  i=2      156       107              84        31
Total           240       240             240       240

By using the B and $\theta$ defined in (3.20) and (3.21) and the
iterative algorithm described in the Appendix, the MDI estimates
x*(ijk) of the cell frequencies are obtained as

                j=1                       j=2
           k=1        k=2            k=1        k=2
i=1       82.883    134.213        157.117    208.448
i=2      157.117    105.787         82.883     31.552

They agree within round-off errors with those obtained by
Berkson [1972]. The MDI statistic 2I(x*:x) equals 0.0819 with
one D.F.
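For the 2x2x2 case the exact MDI estimates can also be obtained without the general algorithm, because the log-linear representation (3.12) leaves a single unknown $\tau$ once the column totals are fixed. The Python sketch below solves for $\tau$ by bisection; it is an illustrative alternative to, not a transcription of, the Appendix algorithm, and the function name and the bracketing interval are assumptions of this sketch.

```python
import math


def exact_mdi_2x2x2(x1, x2, bracket=1000.0):
    """x1, x2: counts for i=1 and i=2 in the factor order (11), (12), (21), (22).
    Returns (tau, fitted i=1 counts, fitted i=2 counts, 2I(x*:x))."""
    sign = (1.0, -1.0, -1.0, 1.0)              # coefficients of (3.2)
    n = [a + b for a, b in zip(x1, x2)]

    def fitted_p(tau):
        # From (3.12): x*(1jk) is proportional to x(1jk) exp(-c tau / x(.jk)),
        # x*(2jk) to x(2jk), and the two must add up to x(.jk).
        probs = []
        for a, b, c, m in zip(x1, x2, sign, n):
            weight = a * math.exp(-c * tau / m)
            probs.append(weight / (weight + b))
        return probs

    def contrast(tau):
        return sum(c * p for c, p in zip(sign, fitted_p(tau)))

    lo, hi = -bracket, bracket                  # assumed to contain the root
    if not (contrast(lo) > 0.0 > contrast(hi)):
        raise ValueError("bracket does not contain the root")
    for _ in range(200):                        # contrast decreases in tau
        mid = 0.5 * (lo + hi)
        if contrast(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    p_star = fitted_p(tau)
    xs1 = [p * m for p, m in zip(p_star, n)]
    xs2 = [(1.0 - p) * m for p, m in zip(p_star, n)]
    stat = 2.0 * sum(f * math.log(f / o)
                     for f, o in zip(xs1 + xs2, list(x1) + list(x2)))
    return tau, xs1, xs2, stat


tau, xs1, xs2, stat = exact_mdi_2x2x2([84, 133, 156, 209], [156, 107, 84, 31])
print(xs1, xs2, stat)
```

Applied to Table 4, the fitted counts and the statistic can be compared with the values quoted above.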


4.  THE  4x2x2  TABLE


Analysis of hypotheses of no linear interaction in a
4x2x2 table is illustrated by Schotz's data (Table 5) on drivers
in injury-producing accidents, taken from Table III of Bhapkar
and Koch [1968], who regard accident severity as response and
the other two classifications as factors.


Table 5

                                                   Accident Severity (i)
Driver Group (k)       Accident Type (j)    Minor   Moderate   Moderately   Severe to    Total
                                                                 Severe      Extreme
"ridit" $r_i$                                .05      .33         .71         .93

Lone Driver            Rollover               21      567        1356         644        2588
                       Non-rollover          996     5454        2773        1256       10479
                       Sub-total            1017     6021        4129        1900       13067

Injured Driver         Rollover               18      553        1734         869        3174
with Passengers        Non-rollover          679     4561        2516        1092        8848
                       Sub-total             697     5114        4250        1961       12022

Total                                       1714    11135        8379        3861       25089

Let us ignore the numerical severity "ridit" scores $r_i$, i=1,...,4, and
consider the hypothesis of no linear second order interaction formulated
in (2.3). The B matrix is

Cell index:  111 211 311 411 112 212 312 412 121 221 321 421 122 222 322 422

               1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0
               0   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0
               0   0   0   0   0   0   0   0   1   1   1   1   0   0   0   0
(4.1)  B =     0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1
               1   0   0   0  -1   0   0   0  -1   0   0   0   1   0   0   0
               0   1   0   0   0  -1   0   0   0  -1   0   0   0   1   0   0
               0   0   1   0   0   0  -1   0   0   0  -1   0   0   0   1   0

and

(4.2)  $\theta$ = (1, 1, 1, 1, 0, 0, 0)'.

Using the algorithm described in the Appendix, the MDI estimates of
cell frequencies come out to be

x*(111) =   27.32    x*(211) =  531.50    x*(311) = 1359.16    x*(411) =  670.02
x*(121) =  932.32    x*(221) = 5535.91    x*(321) = 2768.49    x*(421) = 1242.28
x*(112) =   14.59    x*(212) =  583.70    x*(312) = 1733.23    x*(412) =  842.48
x*(122) =  734.45    x*(222) = 4484.30    x*(322) = 2522.48    x*(422) = 1106.77

The MDI statistic 2I(x*:x) with 3 d.f. is 19.703, which is significant
(it exceeds the tabulated chi-square value 7.815 for 3 d.f.), showing
that the data do not support the hypothesis of no
linear second order interaction as given by (2.3).

It is interesting to examine here the hypothesis of no linear
second order interaction with respect to average "ridits" considered
by Bhapkar and Koch [1968]. The hypothesis is

$$H_1:\; \sum_{i=1}^{4} r_i\,[p(i1k) - p(i2k)] = \Delta, \quad k = 1, 2,$$

where $\Delta$ is a constant. This is equivalent to $\Delta_1 - \Delta_2 = 0$, where
$\Delta_k$ denotes the left-hand side above for the given k. The
5x16 matrix $B_1$ corresponding to $H_1$ has the same first 4 rows as B and
the fifth row is

$$(r_1, r_2, r_3, r_4, -r_1, -r_2, -r_3, -r_4, -r_1, -r_2, -r_3, -r_4, r_1, r_2, r_3, r_4).$$

The vector $\theta_1$ equals (1, 1, 1, 1, 0)'.

The MDI estimates $x_1^*(ijk)$ of cell frequencies are given below:

$x_1^*$(111) =   19.96    $x_1^*$(211) =  551.09    $x_1^*$(311) = 1359.62    $x_1^*$(411) =  657.33
$x_1^*$(121) = 1004.55    $x_1^*$(221) = 5470.00    $x_1^*$(321) = 2759.92    $x_1^*$(421) = 1244.53
$x_1^*$(112) =   18.80    $x_1^*$(212) =  566.69    $x_1^*$(312) = 1732.79    $x_1^*$(412) =  855.72
$x_1^*$(122) =  671.88    $x_1^*$(222) = 4543.52    $x_1^*$(322) = 2529.15    $x_1^*$(422) = 1103.41

The MDI statistic $2I(x_1^*:x)$ is 1.980 with 1 d.f. This should be compared
with the value 2.02 obtained by Bhapkar and Koch [1968] for their
Wald-type statistic.

Now observe that $H_1$ is implied by the stronger hypothesis $H_0$
given by (2.3), since the fifth row of $B_1$ can be expressed as a
linear combination of rows of B. To see this let B(h) denote the
h-th row of B of (4.1); then

$$B_1(5) = r_1B(5) + r_2B(6) + r_3B(7) + r_4[B(1) - B(2) - B(3) + B(4) - B(5) - B(6) - B(7)].$$
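The linear-combination identity for the fifth row of the "ridit" constraint matrix can be checked mechanically. In the Python sketch below the rows of (4.1) are rebuilt from their pattern and the "ridit" scores are those listed in Table 5; the helper name `contrast_row` is an assumption made for this illustration.

```python
r = [0.05, 0.33, 0.71, 0.93]       # "ridit" scores r_1..r_4 from Table 5


def contrast_row(level, n_levels=4):
    """Row of (4.1) carrying the interaction contrast for one response level
    (0-based); the four blocks are the factor combinations (11),(12),(21),(22)."""
    row = [0.0] * (4 * n_levels)
    for block, sgn in enumerate((1.0, -1.0, -1.0, 1.0)):
        row[block * n_levels + level] = sgn
    return row


# Rows B(1)..B(4): normalization within each factor combination.
norm = [[1.0 if c // 4 == b else 0.0 for c in range(16)] for b in range(4)]
# Rows B(5)..B(7): contrasts for response levels 1, 2, 3.
B5, B6, B7 = (contrast_row(i) for i in range(3))

level4 = [n1 - n2 - n3 + n4 - a - b - c
          for n1, n2, n3, n4, a, b, c in zip(norm[0], norm[1], norm[2], norm[3],
                                             B5, B6, B7)]
combo = [r[0] * a + r[1] * b + r[2] * c + r[3] * d
         for a, b, c, d in zip(B5, B6, B7, level4)]

# Fifth row of the "ridit" matrix written out directly.
expected = [sgn * ri for sgn in (1.0, -1.0, -1.0, 1.0) for ri in r]
print(combo)
```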


Hence we can analyze the information 2I(x*:x) as follows:

Analysis of Information

Component due to     Information                   D.F.    Chi-square (5% point)
$H_0$                2I(x*:x)       = 19.703         3         7.815
                     2I(x*:$x_1^*$)   = 17.723         2         5.991
$H_1$                2I($x_1^*$:x)    =  1.980         1         3.841

We see that the data do not provide statistically significant
evidence against the hypothesis $H_1$ of no second-order interaction
with respect to average "ridits". In other words, this hypothesis
does explain the departure from the hypothesis (2.3) of no linear
second-order interaction.

Further analysis of these data can be done in two ways: in
terms of "ridit" values and in terms of the non-quantitative
contrasts among p(ijk) given by the last three rows of the matrix
B of (4.1).

"Ridits". Note that the data are not consistent with the
hypothesis of no linear second-order interaction (2I(x*:x) = 19.703,
3 d.f.), while they can be regarded as consistent with the hypothesis
$H_1$ of equality of means of the "ridit" values ($r_1, r_2, r_3, r_4$) of
the four distributions ($2I(x_1^*:x) = 1.980$, 1 d.f.). The remaining
two degrees of freedom can be associated respectively with the
hypotheses of equality of second and third moments of the "ridit"
values. The hypothesis of equality of means and second moments
(which is equivalent to the hypothesis of equality of means and
variances) corresponds to a 6x16 matrix, $B_2$ say, which has the
first five rows as in $B_1$ and the sixth row is

$$(r_1^2, r_2^2, r_3^2, r_4^2, -r_1^2, -r_2^2, -r_3^2, -r_4^2, -r_1^2, -r_2^2, -r_3^2, -r_4^2, r_1^2, r_2^2, r_3^2, r_4^2),$$

and $\theta_2 = (1, 1, 1, 1, 0, 0)'$.


Under $H_2$ the MDI statistic $2I(x_2^*:x)$ comes out to be 10.036.
The difference 10.036 - 1.980 = 8.056 is the contribution due to the
additional constraint in $B_2$ as compared to $B_1$, assignable to
equality of variances. Finally the difference 19.703 - 10.036 = 9.667
is the contribution due to equality of the third moments in
addition to the equality of the first two moments. Since each of
these differences is asymptotically a chi-square with one degree
of freedom, we conclude that though there is no significant
second-order linear interaction with respect to mean "ridits",
there appears to be a significant contribution due to heterogeneity
of the second and third moments of the four "ridit" distributions.
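The partition of the 3 d.f. statistic into three single-degree-of-freedom components is simple arithmetic, and each component can be referred to the chi-square distribution with one degree of freedom. The Python sketch below is an illustration only; it uses the identity P(X > x) = erfc(sqrt(x/2)) for one degree of freedom, and the dictionary labels are wording introduced here.

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


total = 19.703        # no linear second-order interaction, 3 d.f.
means_and_var = 10.036
means_only = 1.980

components = {
    "means": means_only,
    "variances, given means": means_and_var - means_only,
    "third moments, given first two": total - means_and_var,
}

for label, value in components.items():
    print(f"{label:32s} {value:7.3f}  tail prob. {chi2_upper_tail_1df(value):.4f}")
print("sum of components:", sum(components.values()))
```

Only the first component falls below the 3.841 point quoted in the analysis of information table.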

Non-quantitative approach. A different line of analysis treats
the response variable (accident severity) as a qualitative variable,
ignoring "ridit" values. In this case, since the overall hypothesis
of no linear second-order interaction leads to a significant MDI
statistic (2I(x*:x) = 19.703, 3 d.f.), it may be of interest to
examine which of the three constraints (given by the last three
rows of the matrix B in (4.1)) contribute significantly to 2I(x*:x).
For this purpose, we set up several B-matrices omitting one or
two rows from the last three rows of (4.1) each time. For example,
the B-matrix without the seventh row corresponds to the (weaker)
hypothesis

$$H_3:\; p(111) - p(112) - p(121) + p(122) = 0, \qquad p(211) - p(212) - p(221) + p(222) = 0.$$

Implicit in $H_3$ is the third constraint

$$[p(311)+p(411)] - [p(312)+p(412)] - [p(321)+p(421)] + [p(322)+p(422)] = 0.$$

Hence $H_3$ tests no linear second-order interaction with respect
to levels 1 and 2, combining levels 3 and 4 of the response. Note
that under these weaker hypotheses the MDI statistics will give
a value not larger than 19.703. The analysis is summarized in
Table 6.

Omitting rows 5 and 6 of the matrix B of (4.1) corresponds
to the hypothesis of no linear second-order interaction in a
2x2x2 table with level 3, pooling all the remaining levels. This
is the only hypothesis with which the data are consistent. Thus
it appears that levels 1 and 2 of accident severity both jointly
and separately account for a major (significant) contribution
towards the presence of a linear second-order interaction.
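The reduced B-matrices used in this line of analysis are obtained from (4.1) by deleting or adding rows. The Python sketch below illustrates the bookkeeping and checks the constraint that is implicit when the seventh row is deleted; the helper and variable names are assumptions made for this illustration.

```python
def contrast_row(levels, n_levels=4):
    """Interaction contrast over a pooled set of response levels (0-based);
    blocks are the factor combinations (11), (12), (21), (22)."""
    row = [0] * (4 * n_levels)
    for block, sgn in enumerate((1, -1, -1, 1)):
        for lev in levels:
            row[block * n_levels + lev] = sgn
    return row


# B of (4.1): four normalization rows, then contrasts for levels 1, 2, 3.
B = [[1 if c // 4 == b else 0 for c in range(16)] for b in range(4)]
B += [contrast_row([i]) for i in range(3)]

B_without_7 = B[:6]                       # delete row (7)
B_level3_only = B[:4] + [B[6]]            # delete rows (5) and (6)
B_pool_1_2 = B[:4] + [[a + b for a, b in zip(B[4], B[5])]]   # delete (7), add (5) and (6)

# Constraint implied by the normalizations together with rows (5) and (6):
# the contrast on the pooled levels 3 and 4.
implied = [n1 - n2 - n3 + n4 - a - b
           for n1, n2, n3, n4, a, b in zip(B[0], B[1], B[2], B[3], B[4], B[5])]
print(implied)
```

Each reduced matrix, paired with a $\theta$ vector of four ones followed by zeros, can then be passed to whatever MDI fitting routine is in use.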

Table 6 also indicates a way of reducing categories in a
contingency table with the inherent qualities of the observed
data least affected. Thus if the given 4x2x2 table is to be
reduced to a 3x2x2 table, this should be done by combining
levels 3 and 4. Similarly, if a 2x2x2 table is required as
a partial summary of the 4x2x2 table, one should examine all
the possible ways of pooling the levels of the response variable
and select the way in which the maximum contribution to the linear
second-order interaction is retained. The possible ways are level
(1) against (2)+(3)+(4), (2) against (1)+(3)+(4),
(3) against (1)+(2)+(4), (1)+(2) against (3)+(4),
(1)+(3) against (2)+(4), and (1)+(4) against (2)+(3).

The MDI statistics corresponding to the first three combinations
are given in Table 6 as the three entries 11.803, 9.750, and 0.032
respectively. To find the MDI statistics corresponding to the
remaining three combinations one can add the last two rows of the
B-matrices when rows 7, 6, 5 are omitted one at a time. This gives
the MDI statistics as 3.517, 1.538, and 8.078 respectively. The
largest of these MDI statistics is 11.803, showing that levels
2, 3, and 4 should be pooled and level 1 be retained in the
2x2x2 table.

The analysis above shows that levels 1 and 2 are the main
contributors to the departure from the hypothesis of no linear
second-order interaction.


Table 6

MDI Statistics Under Different B-matrices

Operation on rows of (4.1)           MDI statistic     D.F.

Delete (7)                              18.385           2
Delete (6)                              12.125           2
Delete (5)                              13.188           2
Delete (6), (7)                         11.803           1
Delete (5), (7)                          9.750           1
Delete (5), (6)                          0.032           1
Delete (7), add (5) and (6)              3.517           1
Delete (6), add (5) and (7)              1.538           1
Delete (5), add (6) and (7)              8.088           1


APPENDIX

Described below is an algorithm to obtain x*(ijk) = Np*(ijk)
which minimize the discrimination information function (2.6) subject
to the constraints $Bp = \theta$, where p is as in (2.4) (Gokhale [1974]).

With w(jk) = x(.jk)/N, multiply the first r elements of p by
w(11), the second r elements by w(12), and so on. The vector so
obtained can be written as Wp, where W is a diagonal matrix, the
entries in the first r diagonal positions being w(11), those in the
next r diagonal positions being w(12), etc. In fact, it is easy
to see that Wp is a probability distribution over the rst cells.

The constraints $Bp = \theta$ can be written as

$$BW^{-1}Wp = \theta.$$

The elements of Wp can be indexed by a subscript t, say. It is
thus sufficient to consider the problem of minimizing

$$I(P:\Pi) = \sum_t P_t \ln(P_t/\Pi_t) \tag{A.1}$$

with respect to the constraints

$$CP = \theta. \tag{A.2}$$

Note that $C = BW^{-1}$, $P = Wp$ and $\Pi = W\pi$.

Assume now that the rows of C are linearly independent. There
exists a unique P* which minimizes (A.1) and satisfies

$$\ln P^* = \ln \Pi + C'\lambda, \tag{A.3}$$

where $\ln a$ denotes $(\ln a_1, \ldots, \ln a_n)'$ and $\lambda$ is a vector of Lagrangian
multipliers. (See Kullback [1959].) Let

$$C^+ = C'(CC')^{-1} \quad \text{and} \quad R = C^+C. \tag{A.4}$$

Then equation (A.3) is equivalent to

$$(I - R)(\ln P^* - \ln \Pi) = 0. \tag{A.5}$$

The symmetric and idempotent matrix R projects vectors of dimension
equal to that of P onto the space spanned by rows of C. Let

$$U = \{z : C^+\theta + (I - R)z > 0\},$$

where for a vector x, x > 0 denotes that every element of x is
positive. Then for every $z \in U$, $C^+\theta + (I-R)z$ is a solution of (A.2).
Conversely, for every probability vector P which satisfies (A.2),
there exists a $z \in U$ such that $P = C^+\theta + (I-R)z$. The first assertion
is easy to verify and the second follows by setting z = P and
noting that $C^+\theta = RP$. Consider (A.1) as a function of z defined
over U. Then

$$I(z) \ge 0,$$

the gradient G(z) of I(z) at z is

$$G(z) = (I - R)(\ln P(z) - \ln \Pi), \tag{A.6}$$

and the Hessian of I at z is

$$H(z) = (I - R)\,\Delta(P(z))^{-1}\,(I - R), \tag{A.7}$$

where $\Delta(b)$ denotes a diagonal matrix with elements of vector b in
the diagonal.

Being symmetric and idempotent, I-R is positive semidefinite, and
hence so is H(z), so that I is a convex function of z over the
convex set U.




Thus for a z* satisfying G(z*) = 0, I(z) assumes its minimum over
U. In fact, G(z*) = 0 implies that the corresponding P*
satisfies (A.5) in view of (A.6).

At the s-th iteration the algorithm uses a vector z(s) in U
and the corresponding $P(s) = C^+\theta + (I-R)z(s)$. If

$$\|G(s)\| = [G(s)'G(s)]^{1/2} < \varepsilon, \tag{A.8}$$

where $\varepsilon > 0$ is chosen according to the required accuracy, the
procedure is terminated and P* is set equal to P(s). If (A.8)
does not hold, the direction D(s) of maximum rate of decrease in
I(z) at z(s) is obtained by norming (-G(s)). A positive
constant c(s), sufficiently small, is then found such that with
z(s+1) = z(s) + c(s)D(s) and

$$P(s+1) = C^+\theta + (I-R)z(s+1),$$

$$P(s+1) > 0 \tag{A.9}$$

and

$$I(s+1) < I(s). \tag{A.10}$$

The (s+1)-th iteration is started with z(s+1) and P(s+1).
One way of finding c(s) is to first set it equal to unity. It
is repeatedly doubled until one of (A.9) or (A.10) is violated.
If (A.9) or (A.10) do not hold with c(s) = 1, it is repeatedly
halved until they do.
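The iteration described above can be sketched in plain Python. This is a minimal illustration written from the description in this Appendix, not the authors' APL program: the small linear-algebra helpers, the stopping guards, the default tolerances, and the choice of starting vector (z(1) equal to the observed joint proportions) are all assumptions of the sketch.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda rr: abs(M[rr][col]))
        M[col], M[pivot] = M[pivot], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for rr in range(n):
            if rr != col:
                f = M[rr][col]
                M[rr] = [a - f * b for a, b in zip(M[rr], M[col])]
    return [row[n:] for row in M]


def mdi_fit(B, theta, x, r, eps=1e-9, max_iter=500):
    """Steepest-descent MDI fit; x is ordered as in (2.4), r = response levels.
    Returns (fitted counts, 2I(x*:x))."""
    N = float(sum(x))
    w = []
    for start in range(0, len(x), r):              # w(jk) repeated r times
        w.extend([sum(x[start:start + r]) / N] * r)
    Pi = [v / N for v in x]                        # Pi = W pi
    C = [[b / wt for b, wt in zip(row, w)] for row in B]   # C = B W^{-1}
    Ct = [list(col) for col in zip(*C)]
    C_plus = matmul(Ct, inverse(matmul(C, Ct)))    # (A.4)
    R = matmul(C_plus, C)
    base = matvec(C_plus, theta)                   # C+ theta

    def P_of(z):                                   # C+ theta + (I - R) z
        return [b + zi - rz for b, zi, rz in zip(base, z, matvec(R, z))]

    def info(P):                                   # (A.1)
        return sum(p * math.log(p / q) for p, q in zip(P, Pi))

    def trial(z, D, c, I_cur):                     # checks (A.9) and (A.10)
        z_new = [zi + c * di for zi, di in zip(z, D)]
        P_new = P_of(z_new)
        if min(P_new) <= 0.0:
            return None
        I_new = info(P_new)
        return (z_new, P_new, I_new) if I_new < I_cur else None

    z = Pi[:]                                      # assumed starting vector
    P = P_of(z)
    if min(P) <= 0.0:
        raise ValueError("starting point is not positive")
    I = info(P)
    for _ in range(max_iter):
        logs = [math.log(p / q) for p, q in zip(P, Pi)]
        G = [a - b for a, b in zip(logs, matvec(R, logs))]   # (A.6)
        norm = math.sqrt(sum(g * g for g in G))
        if norm < eps:                             # (A.8)
            break
        D = [-g / norm for g in G]
        c, step = 1.0, trial(z, D, 1.0, I)
        if step is not None:                       # double while still valid
            nxt = trial(z, D, 2.0 * c, I)
            while nxt is not None:
                c, step = 2.0 * c, nxt
                nxt = trial(z, D, 2.0 * c, I)
        else:                                      # halve until valid
            while step is None and c > 1e-18:
                c /= 2.0
                step = trial(z, D, c, I)
            if step is None:
                break
        z, P, I = step
    return [N * p for p in P], 2.0 * N * I
```

With the B and $\theta$ of (3.20) and (3.21) and Bartlett's counts ordered as in (3.20), one would expect the output to approach the estimates reported in Section 3, although this sketch makes no claim about speed of convergence.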

Consider now the choice of z(1) and P(1). If some $\hat p > 0$ is
known to satisfy (A.2), we set $P(1) = z(1) = \hat p$; if not, $\hat p$ can be
found easily, though by trial and error, by several methods. One
method is to compute $q(\zeta) = \Delta(\zeta)C'[C\Delta(\zeta)C']^{-1}\theta$ for a positive
probability vector $\zeta$ and set $\hat p = q(\zeta)$ if the latter is positive.
Another method is to check whether $C^+\theta + (I-R)\zeta$ is positive and, if
so, set it equal to P(1). Usually, putting $\zeta$ equal to the observed
probability vector gives the desired value of $\hat p$. In fact, then
$q(\zeta)$ is the "minimum modified chi-square" estimate of P subject to
(A.2), which minimizes $\sum_t (P_t - \zeta_t)^2/\zeta_t$, while $C^+\theta + (I-R)\zeta$ minimizes
the Euclidean distance between P and $\zeta$. As such, these serve as
good starting points for the iterations.

The numerical computations of sections 3 and 4 were programmed
in APL/360.